
    Early History of Mammals Is Elucidated with the ENCODE Multiple Species Sequencing Data

    Understanding the early evolution of placental mammals is one of the most challenging issues in mammalian phylogeny. Here, we addressed this question by using the sequence data of the ENCODE consortium, which include 1% of mammalian genomes in 18 species belonging to all main mammalian lineages. Phylogenetic reconstructions based on an unprecedented amount of coding sequences taken from 218 genes resulted in a highly supported tree placing the root of Placentalia between Afrotheria and Exafroplacentalia (Afrotheria hypothesis). This topology was validated by the phylogenetic analysis of a new class of genomic phylogenetic markers, the conserved noncoding sequences. Applying the tests of alternative topologies to the coding sequence dataset resulted in the rejection of the Atlantogenata hypothesis (Xenarthra grouping with Afrotheria), while the same tests rejected the second alternative scenario, the Epitheria hypothesis (Xenarthra at the base), when using the noncoding sequence dataset. Thus, the two datasets support the Afrotheria hypothesis; however, neither can reject both of the remaining topological alternatives.

    Non-alignment comparison of human and high primate genomes

    Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in the comparisons of whole-genome sequences, a good similarity was also found after masking gene sequences, indicating that CS analysis manages to reveal phylogenetic signal in the organization of the noncoding part of the genome sequences, including repetitive DNA and the genome "dark matter". The possibility of revealing parallel ordering depends on the signal of common ancestor sequence organization, which varies locally along the corresponding segments of the compared genomes. We explored two sources contributing to this signal: sequence composition (GC content) and sequence organization (abundances of k-mers in the usual A,T,G,C or purine-pyrimidine alphabets). Whole-genome comparisons based on GC distribution along the analyzed sequences indeed give reasonable results, but combining it with k-mer abundances dramatically improves the ordering quality, indicating that compositional and organizational heterogeneity comprise complementary sources of information on evolutionarily conserved similarity of genome sequences.
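
To make the two signal sources named in this abstract concrete, the following is a minimal Python sketch of GC content and k-mer abundance vectors in either the A,T,G,C or the purine-pyrimidine alphabet. It is an illustration only: the function names, the use of exact k-mer matches, and the Euclidean distance are assumptions made here, not the scoring scheme used in the study.

```python
from collections import Counter
from itertools import product
import math


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def to_purine_pyrimidine(seq):
    """Recode A/G as R (purine) and C/T as Y (pyrimidine)."""
    return seq.upper().translate(str.maketrans("AGCT", "RRYY"))


def kmer_spectrum(seq, k=3, alphabet="ACGT"):
    """Relative abundance of every k-mer over `alphabet`, in a fixed order.

    Assumption: exact-match counting; k-mers containing other symbols
    (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def spectrum_distance(a, b):
    """Euclidean distance between two spectra of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Under these assumptions, a dot-plot style comparison would compute one spectrum per sliding window in each genome and mark window pairs whose `spectrum_distance` falls below a chosen threshold; calling `kmer_spectrum(to_purine_pyrimidine(seq), k, alphabet="RY")` gives the reduced-alphabet variant.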

    A custom capture sequence approach for oculocutaneous albinism identifies structural variant alleles at the OCA2 locus

    Oculocutaneous albinism (OCA) is a heritable disorder of pigment production that manifests as hypopigmentation and altered eye development. Exon sequencing of known OCA genes is unsuccessful in producing a complete molecular diagnosis for a significant number of affected individuals. We sequenced the DNA of individuals with OCA using short-read custom capture sequencing that targeted coding, intronic and non-coding regulatory regions of known OCA genes and GWAS-associated pigmentation loci. We identified an OCA2 complex structural variant (CxSV), defined by a 143kb inverted segment reintroduced in intron 1, upstream of the native location. The corresponding CxSV junctions were observed in 11/390 probands screened. The 143kb CxSV presents in one family as a copy number variant (CNV) duplication for the 143kb region. In the remaining 10/11 families, the 143kb CxSV acquired an additional 184kb deletion across the same region, restoring exons 3–19 of OCA2 to a copy-number neutral state. Allele-associated haplotype analysis found rare SNVs rs374519281 and rs139696407 are linked with the 143kb CxSV in both OCA2 alleles. For individuals in which customary molecular evaluation does not reveal a biallelic OCA diagnosis, we recommend preliminary screening for these haplotype-associated rare variants, followed by junction-specific validation for the OCA2 143kb CxSV

    Gene-Specific Substitution Profiles Describe the Types and Frequencies of Amino Acid Changes during Antibody Somatic Hypermutation

    Somatic hypermutation (SHM) plays a critical role in the maturation of antibodies, optimizing recognition initiated by recombination of V(D)J genes. Previous studies have shown that the propensity to mutate is modulated by the context of surrounding nucleotides and that SHM machinery generates biased substitutions. To investigate the intrinsic mutation frequency and substitution bias of SHMs at the amino acid level, we analyzed functional human antibody repertoires and developed mGSSP (method for gene-specific substitution profile), a method to construct amino acid substitution profiles from next-generation sequencing-determined B cell transcripts. We demonstrated that these gene-specific substitution profiles (GSSPs) are unique to each V gene and highly consistent between donors. We also showed that the GSSPs constructed from functional antibody repertoires are highly similar to those constructed from antibody sequences amplified from non-productively rearranged passenger alleles, which do not undergo functional selection. This suggests that the types and frequencies, or mutational space, of a majority of amino acid changes sampled by the SHM machinery are well captured by GSSPs. We further observed the rates of mutational exchange between some amino acids to be both asymmetric and context dependent and to correlate weakly with their biochemical properties. GSSPs provide an improved, position-dependent alternative to standard substitution matrices, and can be used to develop software for accurately modeling the SHM process. GSSPs can also be used for predicting the amino acid mutational space available for antigen-driven selection and for understanding factors modulating the maturation pathways of antibody lineages in a gene-specific context.
    The mGSSP method can be used to build, compare, and plot GSSPs; we report the GSSPs constructed for 69 common human V genes (DOI: 10.6084/m9.figshare.3511083) and provide high-resolution logo plots for each (DOI: 10.6084/m9.figshare.3511085).
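
As a rough illustration of what a position-dependent substitution profile looks like, the sketch below tallies, for each position of a germline V-gene amino acid sequence, how often each non-germline residue appears among aligned antibody sequences. This is not the mGSSP implementation: the function name, the assumption of gap-free alignments of equal length, and the simple frequency normalization are all hypothetical choices made here for clarity.

```python
from collections import Counter


def substitution_profile(germline, sequences):
    """Per-position frequencies of amino acid substitutions.

    `germline` is the germline amino acid sequence of one V gene and
    `sequences` are antibody sequences already aligned to it without gaps
    (an assumption of this sketch). Returns one dict per position mapping
    each observed non-germline residue to its share of the substitutions
    seen at that position.
    """
    counts = [Counter() for _ in germline]
    for seq in sequences:
        if len(seq) != len(germline):
            raise ValueError("sequences must be aligned to the germline length")
        for i, (g, a) in enumerate(zip(germline, seq)):
            if a != g:  # count only changes away from the germline residue
                counts[i][a] += 1

    profile = []
    for c in counts:
        total = sum(c.values())
        profile.append({aa: n / total for aa, n in c.items()} if total else {})
    return profile
```

Under these assumptions, profiles built for the same V gene from two donors could be compared position by position, which is the kind of consistency check the abstract describes.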

    Genetic effects on liver chromatin accessibility identify disease regulatory variants

    Identifying the molecular mechanisms by which genome-wide association study (GWAS) loci influence traits remains challenging. Chromatin accessibility quantitative trait loci (caQTLs) help identify GWAS loci that may alter GWAS traits by modulating chromatin structure, but caQTLs have been identified in a limited set of human tissues. Here we mapped caQTLs in 20 human liver samples and identified 3,123 caQTLs. The caQTL variants are enriched in liver tissue promoter and enhancer states and frequently disrupt binding motifs of transcription factors expressed in liver. We predicted target genes for 861 caQTL peaks using proximity, chromatin interactions, correlation with promoter accessibility or gene expression, and colocalization with expression QTLs. Using GWAS signals for 19 liver function and/or cardiometabolic traits, we identified 110 colocalized caQTLs and GWAS signals, 56 of which contained a predicted caPeak target gene. At the LITAF LDL-cholesterol GWAS locus, we validated that a caQTL variant showed allelic differences in protein binding and transcriptional activity. These caQTLs contribute to the epigenomic characterization of human liver and help identify molecular mechanisms and genes at GWAS loci.

    Initial Sequence and Comparative Analysis of the Cat Genome

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence

    Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

    Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events

    Whole-Exome Sequencing Identifies Homozygous AFG3L2 Mutations in a Spastic Ataxia-Neuropathy Syndrome Linked to Mitochondrial m-AAA Proteases

    We report an early-onset spastic ataxia-neuropathy syndrome in two brothers of a consanguineous family characterized clinically by lower extremity spasticity, peripheral neuropathy, ptosis, oculomotor apraxia, dystonia, cerebellar atrophy, and progressive myoclonic epilepsy. Whole-exome sequencing identified a homozygous missense mutation (c.1847G>A; p.Y616C) in AFG3L2, encoding a subunit of an m-AAA protease. m-AAA proteases reside in the mitochondrial inner membrane and are responsible for removal of damaged or misfolded proteins and proteolytic activation of essential mitochondrial proteins. AFG3L2 forms either a homo-oligomeric isoenzyme or a hetero-oligomeric complex with paraplegin, a homologous protein mutated in hereditary spastic paraplegia type 7 (SPG7). Heterozygous loss-of-function mutations in AFG3L2 cause autosomal-dominant spinocerebellar ataxia type 28 (SCA28), a disorder whose phenotype is strikingly different from that of our patients. As defined in yeast complementation assays, the AFG3L2 Y616C gene product is a hypomorphic variant that exhibited oligomerization defects in yeast as well as in patient fibroblasts. Specifically, the formation of AFG3L2 Y616C complexes was impaired, both with itself and to a greater extent with paraplegin. This produced an early-onset clinical syndrome that combines the severe phenotypes of SPG7 and SCA28, in addition to other “mitochondrial” features such as oculomotor apraxia, extrapyramidal dysfunction, and myoclonic epilepsy. These findings expand the phenotype associated with AFG3L2 mutations and suggest that AFG3L2-related disease should be considered in the differential diagnosis of spastic ataxias.